
 base classifier



Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Neural Information Processing Systems

We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. In a series of numerical experiments, the resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from tight (non-vacuous) generalization bounds when compared to competing algorithms that also minimize PAC-Bayes objectives, both with uninformed (data-independent) and informed (data-dependent) priors.
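To illustrate why a Dirichlet distribution over voter weights can give a closed-form expected risk, here is a minimal sketch for the binary case on a single example. It assumes the majority vote errs when the total weight on the wrong voters is at least 1/2, and uses the standard Dirichlet aggregation property: that weight follows a Beta distribution whose parameters are the summed concentrations of the wrong and right voters. The function names, the midpoint-rule integration, and the Monte Carlo check are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def beta_tail_at_half(a, b, steps=20000):
    """Approximate P(W >= 0.5) for W ~ Beta(a, b) with the midpoint rule.

    Assumes a >= 1 and b >= 1 so the density stays bounded on [0.5, 1].
    """
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = 0.5 / steps
    total = 0.0
    for i in range(steps):
        w = 0.5 + (i + 0.5) * h  # midpoint of the i-th sub-interval
        log_density = log_norm + (a - 1) * math.log(w) + (b - 1) * math.log(1 - w)
        total += math.exp(log_density)
    return total * h


def expected_risk(alpha, wrong):
    """Expected error of the stochastic majority vote on one example.

    alpha: Dirichlet concentration per voter (hypothetical values).
    wrong: per-voter flags, True if that voter misclassifies the example.
    Assumes at least one wrong and one right voter.
    """
    a = sum(al for al, is_wrong in zip(alpha, wrong) if is_wrong)
    b = sum(al for al, is_wrong in zip(alpha, wrong) if not is_wrong)
    return beta_tail_at_half(a, b)


def monte_carlo_risk(alpha, wrong, n_samples=20000, seed=0):
    """Sanity check: sample Dirichlet weights and count majority-vote errors."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        # Normalized independent Gamma draws are Dirichlet distributed.
        g = [rng.gammavariate(al, 1.0) for al in alpha]
        wrong_weight = sum(gi for gi, w in zip(g, wrong) if w) / sum(g)
        errors += wrong_weight >= 0.5
    return errors / n_samples


alpha = [2.0, 1.0, 3.0]
wrong = [True, False, False]
print(expected_risk(alpha, wrong), monte_carlo_risk(alpha, wrong))
```

In this setting the per-example risk is a Beta tail probability, which varies smoothly with the concentrations; that smoothness is what makes a gradient-based objective plausible.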


A Boosting-Type Convergence Result for AdaBoost.MH with Factorized Multi-Class Classifiers

Neural Information Processing Systems

AdaBoost is a well-known boosting algorithm. Schapire and Singer propose an extension of AdaBoost, named AdaBoost.MH, for multi-class classification problems. Kégl shows empirically that AdaBoost.MH works better when the classical one-against-all base classifiers are replaced by factorized base classifiers containing a binary classifier and a vote (or code) vector. However, the factorization makes it much more difficult to provide a convergence result for the factorized version of AdaBoost.MH. Kégl therefore raised an open problem at COLT 2014: find a convergence result for the factorized AdaBoost.MH. In this work, we resolve this open problem by presenting a convergence result for AdaBoost.MH with factorized multi-class classifiers.
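As a rough illustration of what a factorized base classifier looks like, the sketch below combines one binary classifier with a vote vector holding one ±1 entry per class, scaled by a coefficient. The decision stump used as the binary classifier, and all names and values, are assumptions chosen for illustration; the abstract does not specify them.

```python
def stump(x, feature, threshold):
    """Hypothetical binary classifier: returns +1 or -1 from one feature."""
    return 1 if x[feature] > threshold else -1


def factorized_base_classifier(x, alpha, votes, feature, threshold):
    """Return one score per class: alpha * votes[k] * stump(x).

    votes is the vote (code) vector with entries in {+1, -1}; a single
    binary decision is shared across all classes, and the vote vector
    says which classes that decision supports or opposes.
    """
    phi = stump(x, feature, threshold)
    return [alpha * v * phi for v in votes]


scores = factorized_base_classifier(
    x=[0.2, 1.5], alpha=0.5, votes=[1, -1, 1], feature=1, threshold=1.0
)
print(scores)
```

By contrast, a one-against-all base classifier would train a separate binary classifier for each class instead of sharing one binary decision.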


Supplementary Material for Understanding and Improving Ensemble Adversarial Defense

Neural Information Processing Systems

They are used to test the proposed enhancement approach iGAT. In general, ADP employs an ensemble by averaging, i.e., (C 1) ( C 1) Adversarial examples are generated to compute the losses by using the PGD attack. Our main theorem builds on a supporting Lemma 2.1. We start from the cross-entropy loss curvature measured by Eq. The above new expression of T(x) helps bound the difference between h(x) and h(x). Note that these three cases are mutually exclusive.






A Derivation details under Dirichlet assumptions

Neural Information Processing Systems

We finally report the detailed comparison on real benchmarks in Figure 14 and Tables 1 and 2.

Figure 13: Comparison voter strength (1st column), training error (2nd column), test error (3rd


Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound

Neural Information Processing Systems

While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective.